knitr::opts_chunk$set(echo = TRUE)
Here we have scRNA-Seq data for three different treatments or conditions; namely, the virus life cycle stage of HSV1 infections: “acute”, “latent”, and “reactivation”.
Previously we looked at using Seurat to cluster and visualize scRNA-seq data, as well as identify marker genes that define the clusters and identify genes that vary between life cycle stage.
Here we will take a look at another popular R porgram for scRNA-seq anlaysis: Monocle version 3
First we will load the scRNA-seq data and then perform prepreocessing, dimenionsality reduction (again using UMAP), and finally look at generating a gene expression trajectory and seeing how gene expression changes over said trajectory.
Load the R libraries, genes text file and the expression data (expression data is currently stored as a seurat RDS oobject).
suppressPackageStartupMessages(
{
library(ggplot2)
library(Seurat)
library(monocle)
library(monocle3)
library(ggrepel)
}
)
viral.genes <- read.table(file="/slipstream/home/mmariani/projects/tutorials/monocle3/viral.genes.txt",
header=FALSE,
stringsAsFactors = FALSE,
sep="\t")$V1
infected.object <- readRDS(file = "/slipstream/home/mmariani/projects/tutorials/monocle3/seurat.object2.rds")
paste0(length(Seurat::Cells(infected.object)), " total cells in object, ~5e3 cells per virus life cycle stage")
Monocle3 has its own functionality for loading scRNA-seq data, in this example we have a pre-existing seurat object that we will convert into a monocle3 object. Note that it is more direct when possible, and probably better practice, to load data directly into monocle3, in this case we have a Seurat object so we wil need to convert it.
data <- as(as.matrix(infected.object@assays$RNA@counts), 'sparseMatrix')
##rownames(data)
##colnames(data)
##length(colnames(data))
infected.data <- as(as.matrix(infected.object@assays$RNA@counts), 'sparseMatrix')
##rownames(data)
##colnames(data)
pData <- infected.object@meta.data
pData$cell <- rownames(pData)
fData <- data.frame(id = row.names(data), gene_short_name = row.names(data))
rownames(fData) <- rownames(data)
infected <- monocle3::new_cell_data_set(expression_data=infected.data,
cell_metadata = pData,
gene_metadata = fData)
##ncol(infected)
##unique(infected@colData$orig.ident)
##Preprocessing
infected <- monocle3::preprocess_cds(infected, num_dim = 100)
##Variance explained
Check the “variance explained plot” to make sure you are capturing enough dimensions, we can see that from the plot, 100, as specified above, is more than adequates
pc_variance_explained <- plot_pc_variance_explained(infected)
pc_variance_explained
##Dimensionality Reduction
infected <- monocle3::reduce_dimension(infected,
max_components = 2,
cores=8,
verbose=FALSE)
infected <- monocle3::cluster_cells(infected,
verbose=FALSE)
infected <- monocle3::learn_graph(infected, verbose=FALSE, use_partition=TRUE)
monocle3::plot_cells(infected,
color_cells_by = "orig.ident",
label_cell_groups=TRUE,
label_leaves=TRUE,
label_branch_points=TRUE,
graph_label_size=3) +
theme(legend.position = "right")
##Overall viral gene expression
Look at the viral gene expression across all cells grouped by virus life cycle stage. May need to store the plot as an R variable and use ggsave to save it with large enough size parameters for adequate inspection.
monocle3::plot_genes_violin(subset(infected,rownames(infected) %in% viral.genes),
group_cells_by = "orig.ident",
ncol=6)
##Differential gene expression analysis
Here we can use the graph_test autocorrelation test to see which genes vary across the UMAP/partitions calculated above by Seurat. Below we get the genes with q_value < 0.05
pr_graph_test_res <- graph_test(infected, neighbor_graph="knn", cores=8)
pr_deg_ids <- row.names(subset(pr_graph_test_res, q_value < 0.05))
Looking at the above plot choose starting nodes for our trajectorie(s). Load
#root_pr_nodes can be found in infected@principal_graph_aux$UMAP$root_pr_nodes , after
#manual selection. Here I have specified them explicitly after choosing them manually
root_pr_nodes_chosen <- c(
"Y_8",
"Y_10",
"Y_15",
"Y_53",
"Y_56",
"Y_65",
"Y_71",
"Y_74",
"Y_155",
"Y_158",
"Y_176",
"Y_177",
"Y_179",
"Y_181",
"Y_182",
"Y_183",
"Y_185",
"Y_189",
"Y_190",
"Y_191",
"Y_193",
"Y_194",
"Y_195",
"Y_197",
"Y_199",
"Y_203",
"Y_208",
"Y_209",
"Y_211",
"Y_212",
"Y_222")
infected <- monocle3::order_cells(infected,
reduction_method = "UMAP",
root_pr_nodes = root_pr_nodes_chosen,
verbose=FALSE)
##Trajectory visualization
Now let’s look at our trajectory based off of our selected starting nodes from above - note that the partitions will change depending on which starting nodes that you choose.
monocle3::plot_cells(infected,
color_cells_by = "orig.ident",
label_cell_groups=TRUE,
label_leaves=TRUE,
label_branch_points=TRUE,
graph_label_size=3) +
theme(legend.position = "right")
monocle3::plot_cells(infected,
color_cells_by = "partition",
label_cell_groups=TRUE,
label_leaves=TRUE,
label_branch_points=TRUE,
graph_label_size=3) +
theme(legend.position = "right")
monocle3::plot_cells(infected,
color_cells_by = "seurat_clusters",
label_cell_groups=TRUE,
label_leaves=TRUE,
label_branch_points=TRUE,
graph_label_size=3) +
theme(legend.position = "right")
monocle3::plot_cells(infected,
color_cells_by = "pseudotime",
label_cell_groups=FALSE,
label_leaves=TRUE,
label_branch_points=TRUE,
graph_label_size=3) +
theme(legend.position = "right")
Find virus genes across that vary across our Trajectory (note this is not identical to finding genes that vary across the umap clusters/partitions). For a large number of genes you may want to store the plot object as a variable and use ggsave to save a large .pdf or .jpg for easier inspection.
First we will subset out the virus genes from the Monocle3 CDS object (“infected”)
virus_lineage_cds <- infected[rowData(infected)$gene_short_name %in% viral.genes,]
pseudotime plot with the overlain expression trajectory, colored by the orignal cell identities
plot_genes_in_pseudotime(virus_lineage_cds,
color_cells_by="old.ident",
min_expr=0.5)
pseudotime plot with the overlain expression trajectory, colored by the clusters formerly identified using Seurat
plot_genes_in_pseudotime(virus_lineage_cds,
color_cells_by="seurat_clusters",
min_expr=0.5)